Gene Onoltogy Analysis on iPSCs Heterozygous

1. Environment Set Up

for (i in 1:length(params))
  print(paste('Parameter:', names(params)[i], ' - Value:', params[[i]], '- Class:', class(params[[i]])))
## [1] "Parameter: Dataset  - Value: CHD2_iPSCs_and_organoids_PublicRepo - Class: character"
## [1] "Parameter: SEFile  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/iPSCs/Output/Savings/ipsc.SE_deseq2_HT.rds - Class: character"
## [1] "Parameter: DEAList  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/iPSCs/Output/Savings/ipsc.DEAList_HT.rds - Class: character"
## [1] "Parameter: HT  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/2.DEA/iPSCs/Output/Savings/ipsc.deseqHTvsWT.rds - Class: character"
## [1] "Parameter: SavingFolder  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/6.Enrichments/iPSCs/Output/Savings/ - Class: character"
## [1] "Parameter: FiguresFolder  - Value: /group/testa/Project/CHD2/BulkRNAseq/results/PublicRepo/6.Enrichments/iPSCs/Output/Figures/ - Class: character"
## [1] "Parameter: FDRthr  - Value: 0.05 - Class: numeric"
## [1] "Parameter: logFCthr  - Value: 0.55 - Class: numeric"
## [1] "Parameter: TopGO  - Value: BP_MF_CC - Class: character"
## [1] "Parameter: GoEnTh  - Value: 1 - Class: numeric"
## [1] "Parameter: GoPvalTh  - Value: 0.05 - Class: numeric"
## [1] "Parameter: NbName  - Value: TopGO_iPSCs_HT - Class: character"
## [1] "Parameter: SaveImages  - Value: FALSE - Class: logical"
  • Dataset: name of the dataset that is processed.
  • SE: input file containing differential expression results from DESeq2 (absolute path).
  • FDRthr: threshold on False Discovery Rate. Default 0.05.
  • logFCthr: threshold on False Discovery Rate. Default 1.5.
  • SavingFolder: directory where produced files will be written (absolute path). Default is getwd().
  • TopGO: string that specify the ontology domains to be analysed. Default BP and MF; also CC can be added.
library(RNASeqBulkExploratory)
library(SummarizedExperiment)
library(tidyr)
library(AnnotationDbi)
library(org.Hs.eg.db)
library(topGO)
library(sechm)
library(ggplot2)
library(grid)
library(gridExtra)
library(RColorBrewer)
library(cowplot)
source('/group/testa/Users/oliviero.leonardi/myProjects/CHD2/BulkRNAseq/ContainerHome/CHD2_organoids/NoGradientBarplots.R')
Dataset <- params$Dataset
logFCthr <- params$logFCthr
FDRthr <- params$FDRthr

FdrTh <- FDRthr
logFcTh <- logFCthr

SavingFolder <- ifelse(is.null(params$SavingFolder), getwd(), params$SavingFolder)
FiguresFolder <- ifelse(is.null(params$FiguresFolder), getwd(), params$FiguresFolder)


if (dir.exists(SavingFolder) == FALSE) {
  dir.create(SavingFolder, recursive=TRUE)
}

2. Data Upload

  • Summarized Experiment object containing expression data used for DEA and gene and sample metadata
  • DEA object, containing results of the differential expression

2.1 Load Data from DEA

#SE object coming from DEA, but not containing specific contrast results
SE_DEA <- readRDS(params$SEFile)

SE_DEA <- SE_DEA[rowData(SE_DEA)$GeneName != '', ]
rownames(SE_DEA) <- rowData(SE_DEA)$GeneName

# List with differential expression results (all time-points)
DEA <- readRDS(params$DEAList)
colvector <- c("#5ec962", "#e95462", "#2c728e")
names(colvector) <- c('All', 'Up',  'Down')

2.2 Add DEA results to SE

if(! identical(rownames(SE_DEA), row.names(DEA$HT$res))){
  stop('Expression data in SE and results from differential espression analysis are inconsistent.')
}

SE_DEA <- mergeDeaSE(SE_DEA, DEA$HT$res, subsetRowDataCols=NULL,
                     logFcCol='log2FoldChange', FdrCol='padj') #specify
## Renaming " log2FoldChange " to "logFC"
## Renaming "  padj  " to "FDR"

16981 genes in 9 samples have been testes for differential expression.

The following number of genes are identified as differentially expressed:

  • FDR < 0.1: 4550 differentially expressed genes
  • FDR < 0.05: 3649 differentially expressed genes
  • FDR < 0.05 and FC > 1.5: 2574 differentially expressed genes
  • FDR < 0.05 and FC > 2: 1639 differentially expressed genes

Imposing a threshold of 0.55 on the Log2FC and 0.05 on the FDR (as specified in parameters), 3649 genes are selected: 2700 up-regulated genes and 2707 down-regulated genes.


3. RESULTS NAVIGATION: Interactive Table

An interactive table show the results for all DEGs (ranked according to FDR). A table of all DEGs can be downloaded from here.

From topTag I generate an interactive table for result interrogation with link to the Gene Cards. The table reports all the genes having a FDR < 0.05 and a Log2FC > 0.55 as absolute value, according to the threshold settings.

DEGsTable(SE_DEA, FdrTh=FdrTh, logFcTh=logFCthr, maxGenes=Inf, saveDEGs=FALSE)

4. RESULTS VISUALIZATION

4.1 Volcano plot

The results of the differential expression analysis are visualized by Volcano plot. An interactive version is included in the html (only genes with FDR < threshold), while a static version is saved.

plotVolcanoSE(SE=SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr, 
              FdrCeil=1e-10, logFcCeil=4, Interactive = FALSE)
## Warning: Removed 16781 rows containing missing values or values outside the scale range
## (`geom_text_repel()`).
## Warning: ggrepel: 113 unlabeled data points (too many overlaps). Consider increasing max.overlaps


plotVolcanoSE(SE=SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr, 
              FdrCeil=1e-10, logFcCeil=4, Interactive = TRUE)

4.2 Heatmap for significant genes

Heatmaps for DEGs, showing scaled vst values.

DEGs <- dplyr::filter(data.frame(rowData(SE_DEA)), FDR < FDRthr & abs(logFC) > log2(logFCthr))
ScaledCols <- c('darkblue', "purple","white","lightgoldenrod1", 'goldenrod1')
# sechm::sechm(SE_DEA, genes=DEGs$GeneName, assayName="vst", gaps_at="Genotype", show_rownames=FALSE,
#       top_annotation=c('Genotype'), hmcols=ScaledCols, show_colnames=TRUE,
#       do.scale=TRUE, breaks=0.85, column_title = "Scaled Vst Values")

5. TOPGO for Gene Ontology Enrichment analysis

Gene ontology enrichment analysis is performed on the set of 3649 genes using TopGO with Fisher statistics and weight01 algorithm.

For each specified domain of the ontology:

  • Enrichment analysis on all DEGs or splitted in down- and up-regulated

5.1 Selection of modulated genes and generation of gene vectors

I generate vectors for the gene universe, all modulated genes, up-regulated genes and down-regulated genes in the format required by TopGo.

GeneVectors <- topGOGeneVectors(SE_DEA, FdrTh=FDRthr, logFcTh=logFCthr)
## Gene vector contains levels: 0,1
## Gene vector contains levels: 0,1
## Gene vector contains levels: 0,1

Therefore:

  • universe genes: 16981 genes
  • modulated genes: 2711 genes
  • down-regulated genes: 1325 genes of interest
  • up-regulated genes: 1386 genes of interest

Then I set parameters according to the gene ontology domains to be evaluated. By default, Biological Process and Molecular Function domains are interrogated.

BpEval <- ifelse(length(grep('BP', params$TopGO))!=0, TRUE, FALSE)
MfEval <- ifelse(length(grep('MF', params$TopGO))!=0, TRUE, FALSE)
CcEval <- ifelse(length(grep('CC', params$TopGO))!=0, TRUE, FALSE)

5.2 TopGO analysis: Biological Process

On the basis of the analysis settings, the enrichment for Biological Process IS performed.

Biological Process Analysis for ALL modulated genes: 2711 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
BPannHT <- topGO::annFUN.org(whichOnto="BP", feasibleGenes=names(GeneVectors$DEGenes), 
                    mapping="org.Hs.eg.db", ID="symbol") %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResBPAllHT <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=BPannHT, ontology='BP', 
                         desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='BPAllHT', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11678 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15327 GO terms and 34854 relations. )
## 
## Annotating nodes ...............
##  ( 13748 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 7776 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 19:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 18:  9 nodes to be scored    (0 eliminated genes)
## 
##   Level 17:  12 nodes to be scored   (31 eliminated genes)
## 
##   Level 16:  31 nodes to be scored   (54 eliminated genes)
## 
##   Level 15:  74 nodes to be scored   (140 eliminated genes)
## 
##   Level 14:  142 nodes to be scored  (373 eliminated genes)
## 
##   Level 13:  236 nodes to be scored  (791 eliminated genes)
## 
##   Level 12:  384 nodes to be scored  (1829 eliminated genes)
## 
##   Level 11:  688 nodes to be scored  (3873 eliminated genes)
## 
##   Level 10:  998 nodes to be scored  (5939 eliminated genes)
## 
##   Level 9:   1202 nodes to be scored (7698 eliminated genes)
## 
##   Level 8:   1172 nodes to be scored (9592 eliminated genes)
## 
##   Level 7:   1067 nodes to be scored (11065 eliminated genes)
## 
##   Level 6:   854 nodes to be scored  (12166 eliminated genes)
## 
##   Level 5:   509 nodes to be scored  (12798 eliminated genes)
## 
##   Level 4:   262 nodes to be scored  (13276 eliminated genes)
## 
##   Level 3:   110 nodes to be scored  (13446 eliminated genes)
## 
##   Level 2:   22 nodes to be scored   (13536 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13586 eliminated genes)

Biological Process Analysis for DOWN-REGULATED genes: 1325 genes

# Wrapper function for topGO analysis (see helper file)
ResBPDownHT <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=BPannHT, ontology='BP', 
                          desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='BPDownHT', outDir=paste0(SavingFolder)) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11678 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15327 GO terms and 34854 relations. )
## 
## Annotating nodes ...............
##  ( 13748 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 6505 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 19:  1 nodes to be scored    (0 eliminated genes)
## 
##   Level 18:  8 nodes to be scored    (0 eliminated genes)
## 
##   Level 17:  11 nodes to be scored   (18 eliminated genes)
## 
##   Level 16:  23 nodes to be scored   (50 eliminated genes)
## 
##   Level 15:  58 nodes to be scored   (136 eliminated genes)
## 
##   Level 14:  109 nodes to be scored  (336 eliminated genes)
## 
##   Level 13:  165 nodes to be scored  (709 eliminated genes)
## 
##   Level 12:  282 nodes to be scored  (1657 eliminated genes)
## 
##   Level 11:  520 nodes to be scored  (3643 eliminated genes)
## 
##   Level 10:  794 nodes to be scored  (5779 eliminated genes)
## 
##   Level 9:   988 nodes to be scored  (7387 eliminated genes)
## 
##   Level 8:   1005 nodes to be scored (9350 eliminated genes)
## 
##   Level 7:   934 nodes to be scored  (10911 eliminated genes)
## 
##   Level 6:   763 nodes to be scored  (12051 eliminated genes)
## 
##   Level 5:   470 nodes to be scored  (12773 eliminated genes)
## 
##   Level 4:   247 nodes to be scored  (13266 eliminated genes)
## 
##   Level 3:   104 nodes to be scored  (13444 eliminated genes)
## 
##   Level 2:   22 nodes to be scored   (13536 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13586 eliminated genes)

# Selection on enrichment of at least 2 is implemented (also to avoid depleted categories). Then categories are ranked by PVal and all the ones with Pval < th are selected. If the number is < minTerms, othter terms are included to reach the minimum number. 
GOTable(ResBPDownHT$ResSel, maxGO=20)

Biological Process Analysis for UP-REGULATED genes: 1386 genes

ResBPUpHT <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=BPannHT, ontology='BP', 
                        desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='BPUpHT', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 11678 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 15327 GO terms and 34854 relations. )
## 
## Annotating nodes ...............
##  ( 13748 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 6872 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 19:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 18:  9 nodes to be scored    (0 eliminated genes)
## 
##   Level 17:  12 nodes to be scored   (31 eliminated genes)
## 
##   Level 16:  26 nodes to be scored   (54 eliminated genes)
## 
##   Level 15:  59 nodes to be scored   (140 eliminated genes)
## 
##   Level 14:  118 nodes to be scored  (310 eliminated genes)
## 
##   Level 13:  205 nodes to be scored  (688 eliminated genes)
## 
##   Level 12:  323 nodes to be scored  (1672 eliminated genes)
## 
##   Level 11:  590 nodes to be scored  (3743 eliminated genes)
## 
##   Level 10:  863 nodes to be scored  (5767 eliminated genes)
## 
##   Level 9:   1050 nodes to be scored (7438 eliminated genes)
## 
##   Level 8:   1038 nodes to be scored (9383 eliminated genes)
## 
##   Level 7:   951 nodes to be scored  (10899 eliminated genes)
## 
##   Level 6:   779 nodes to be scored  (12083 eliminated genes)
## 
##   Level 5:   468 nodes to be scored  (12775 eliminated genes)
## 
##   Level 4:   246 nodes to be scored  (13262 eliminated genes)
## 
##   Level 3:   109 nodes to be scored  (13445 eliminated genes)
## 
##   Level 2:   22 nodes to be scored   (13535 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (13586 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/BPUp'), recursive=TRUE)
#GOAnnotation(ResBPUp$ResSel, GOdata=ResBPUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/BPUp'), keytype='SYMBOL')
GOTable(ResBPUpHT$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResBPAllHT$ResSel, TopGOResDown=ResBPDownHT$ResSel, TopGOResUp = ResBPUpHT$ResSel,
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResBPAllHT$ResSel, TopGOResDown = ResBPDownHT$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`

5.3 TopGO analysis: Molecular Function

On the basis of the analysis settings, the enrichment for Molecular Function IS performed.

Molecular Function Enrichment for ALL modulated genes: 2711 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
MFannHT <- topGO::annFUN.org(whichOnto='MF', feasibleGenes=names(GeneVectors$DEGenes), 
                           mapping='org.Hs.eg.db', ID='symbol') %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResMFAllHT <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=MFannHT, ontology='MF', 
                         desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='MFAllHT', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4161 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4624 GO terms and 5978 relations. )
## 
## Annotating nodes ...............
##  ( 14167 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 1479 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 12:  5 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  19 nodes to be scored   (0 eliminated genes)
## 
##   Level 10:  37 nodes to be scored   (36 eliminated genes)
## 
##   Level 9:   77 nodes to be scored   (214 eliminated genes)
## 
##   Level 8:   143 nodes to be scored  (1429 eliminated genes)
## 
##   Level 7:   249 nodes to be scored  (3539 eliminated genes)
## 
##   Level 6:   307 nodes to be scored  (4422 eliminated genes)
## 
##   Level 5:   320 nodes to be scored  (6435 eliminated genes)
## 
##   Level 4:   238 nodes to be scored  (9301 eliminated genes)
## 
##   Level 3:   67 nodes to be scored   (11486 eliminated genes)
## 
##   Level 2:   16 nodes to be scored   (12217 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14033 eliminated genes)

Molecular Function Enrichment for DOWN-REGULATED genes: 1325 genes

ResMFDownHT <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=MFannHT, ontology='MF', 
                          desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='MFDownHT', outDir=SavingFolder) 
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4161 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4624 GO terms and 5978 relations. )
## 
## Annotating nodes ...............
##  ( 14167 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 1241 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 12:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  16 nodes to be scored   (0 eliminated genes)
## 
##   Level 10:  29 nodes to be scored   (22 eliminated genes)
## 
##   Level 9:   58 nodes to be scored   (207 eliminated genes)
## 
##   Level 8:   113 nodes to be scored  (1373 eliminated genes)
## 
##   Level 7:   197 nodes to be scored  (3435 eliminated genes)
## 
##   Level 6:   256 nodes to be scored  (4250 eliminated genes)
## 
##   Level 5:   283 nodes to be scored  (6122 eliminated genes)
## 
##   Level 4:   207 nodes to be scored  (9113 eliminated genes)
## 
##   Level 3:   62 nodes to be scored   (11423 eliminated genes)
## 
##   Level 2:   16 nodes to be scored   (12178 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14030 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/MFDown'), recursive=TRUE)
#GOAnnotation(ResMFDown$ResSel, GOdata=ResMFDown$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/MFDown'), keytype='SYMBOL')
GOTable(ResMFDownHT$ResSel, maxGO=20)

Molecular Function Analysis for UP-REGULATED genes: 1386 genes

ResMFUpHT <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=MFannHT, ontology='MF', 
                        desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='MFUpHT', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 4161 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 4624 GO terms and 5978 relations. )
## 
## Annotating nodes ...............
##  ( 14167 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 1215 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 12:  4 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  15 nodes to be scored   (0 eliminated genes)
## 
##   Level 10:  26 nodes to be scored   (30 eliminated genes)
## 
##   Level 9:   59 nodes to be scored   (199 eliminated genes)
## 
##   Level 8:   108 nodes to be scored  (1334 eliminated genes)
## 
##   Level 7:   199 nodes to be scored  (3426 eliminated genes)
## 
##   Level 6:   260 nodes to be scored  (4255 eliminated genes)
## 
##   Level 5:   262 nodes to be scored  (6195 eliminated genes)
## 
##   Level 4:   207 nodes to be scored  (9088 eliminated genes)
## 
##   Level 3:   58 nodes to be scored   (11413 eliminated genes)
## 
##   Level 2:   16 nodes to be scored   (12181 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14031 eliminated genes)

#dir.create(paste0(SavingFolder, 'TopGO/MFUp'), recursive=TRUE)
#GOAnnotation(ResMFUp$ResSel, GOdata=ResMFUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/MFUp'), keytype='SYMBOL')
GOTable(ResMFUpHT$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResMFAllHT$ResSel, TopGOResDown=ResMFDownHT$ResSel, TopGOResUp=ResMFUpHT$ResSel, 
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResMFAllHT$ResSel, TopGOResDown = ResMFDownHT$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`

5.4 TopGO analysis: Cellular Component

On the basis of the analysis settings, the enrichment for Cellular Component IS performed.

Cellular Component Enrichment for ALL modulated genes: 2711 genes

# I generate a list that contains the association between each gene and the GO terms that are associated to it
CCannHT <- topGO::annFUN.org(whichOnto='CC', feasibleGenes=names(GeneVectors$DEGenes), 
                           mapping='org.Hs.eg.db', ID='symbol') %>% inverseList()

# Wrapper function for topGO analysis (see helper file)
ResCCAllHT <- topGOResults(Genes=GeneVectors$DEGenes, gene2GO=CCannHT, ontology='CC', 
                         desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                         EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                         saveRes=TRUE, fileName='CCAllHT', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1701 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1923 GO terms and 3234 relations. )
## 
## Annotating nodes ...............
##  ( 14409 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 850 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 13:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 12:  8 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  38 nodes to be scored   (52 eliminated genes)
## 
##   Level 10:  90 nodes to be scored   (103 eliminated genes)
## 
##   Level 9:   129 nodes to be scored  (937 eliminated genes)
## 
##   Level 8:   130 nodes to be scored  (2889 eliminated genes)
## 
##   Level 7:   137 nodes to be scored  (5185 eliminated genes)
## 
##   Level 6:   118 nodes to be scored  (8791 eliminated genes)
## 
##   Level 5:   89 nodes to be scored   (10501 eliminated genes)
## 
##   Level 4:   58 nodes to be scored   (12526 eliminated genes)
## 
##   Level 3:   47 nodes to be scored   (13792 eliminated genes)
## 
##   Level 2:   2 nodes to be scored    (14206 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14336 eliminated genes)

#write.table(ResCCAll$ResAll, file=paste0(SavingFolder, 'TopGO/CCAllResults.txt'), sep='\t', row.names=FALSE)

Cellular Component Enrichment for DOWN-REGULATED genes: 1325 genes

# Wrapper function for topGO analysis (see helper file)
ResCCDownHT <- topGOResults(Genes=GeneVectors$DEGenesDown, gene2GO=CCannHT, ontology='CC', 
                          desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                          EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                          saveRes=TRUE, fileName='CCDownHT', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1701 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1923 GO terms and 3234 relations. )
## 
## Annotating nodes ...............
##  ( 14409 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 711 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 13:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 12:  6 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  32 nodes to be scored   (52 eliminated genes)
## 
##   Level 10:  74 nodes to be scored   (85 eliminated genes)
## 
##   Level 9:   103 nodes to be scored  (851 eliminated genes)
## 
##   Level 8:   105 nodes to be scored  (2783 eliminated genes)
## 
##   Level 7:   112 nodes to be scored  (5055 eliminated genes)
## 
##   Level 6:   105 nodes to be scored  (8741 eliminated genes)
## 
##   Level 5:   74 nodes to be scored   (10473 eliminated genes)
## 
##   Level 4:   51 nodes to be scored   (12524 eliminated genes)
## 
##   Level 3:   43 nodes to be scored   (13792 eliminated genes)
## 
##   Level 2:   2 nodes to be scored    (14206 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14336 eliminated genes)


#dir.create(paste0(SavingFolder, 'TopGO/CCDown'), recursive=TRUE)
#GOAnnotation(ResCCDown$ResSel, GOdata=ResCCDown$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/CCDown'), keytype='SYMBOL')
GOTable(ResCCDownHT$ResSel, maxGO=20)

Cellular Component Analysis for UP-REGULATED genes: 1386 genes

# Wrapper function for topGO analysis (see helper file)
ResCCUpHT <- topGOResults(Genes=GeneVectors$DEGenesUp, gene2GO=CCannHT, ontology='CC', 
                        desc=NULL, nodeSize=5, algorithm='weight01', statistic='fisher', 
                        EnTh=params$GoEnTh, PvalTh=params$GoPvalTh, minTerms=15, geneTh=4,
                        saveRes=TRUE, fileName='CCUpHT', outDir=SavingFolder)
## Gene vector contains levels: 0,1
## 
## Building most specific GOs .....
##  ( 1701 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 1923 GO terms and 3234 relations. )
## 
## Annotating nodes ...............
##  ( 14409 genes annotated to the GO terms. )
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 703 nontrivial nodes
##       parameters: 
##           test statistic: fisher
## 
##   Level 12:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 11:  31 nodes to be scored   (0 eliminated genes)
## 
##   Level 10:  71 nodes to be scored   (26 eliminated genes)
## 
##   Level 9:   104 nodes to be scored  (842 eliminated genes)
## 
##   Level 8:   108 nodes to be scored  (2665 eliminated genes)
## 
##   Level 7:   114 nodes to be scored  (5049 eliminated genes)
## 
##   Level 6:   100 nodes to be scored  (8647 eliminated genes)
## 
##   Level 5:   81 nodes to be scored   (10460 eliminated genes)
## 
##   Level 4:   48 nodes to be scored   (12496 eliminated genes)
## 
##   Level 3:   40 nodes to be scored   (13787 eliminated genes)
## 
##   Level 2:   2 nodes to be scored    (14203 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (14336 eliminated genes)


#dir.create(paste0(SavingFolder, 'TopGO/CCUp'), recursive=TRUE)
#GOAnnotation(ResCCUp$ResSel, GOdata=ResCCUp$GOdata, SavingFolder=paste0(SavingFolder, 'TopGO/CCUp'), keytype='SYMBOL')
GOTable(ResCCUpHT$ResSel, maxGO=20)

Result visualization: Barplot

topGOBarplotAll(TopGOResAll=ResCCAllHT$ResSel, TopGOResDown=ResCCDownHT$ResSel, TopGOResUp=ResCCUpHT$ResSel,
                terms=12, pvalTh=0.05, plotTitle=NULL, gradient = FALSE, cols = colvector)
## Warning in topGOBarplotAll(TopGOResAll = ResCCAllHT$ResSel, TopGOResDown = ResCCDownHT$ResSel, : Please make sure you specified names of the cols vector correctly.
##                 See `?topGOBarplotAll`


7. Savings

Most of the useful information has been saved during the analysis. Here I save figures, workspace and information about the session.

if (params$SaveImages == TRUE){   #Just in case since eval only works when knitting
  #Set the folder paths
  from <- paste(getwd(), paste(params$NbName, 'files/figure-html', sep='_'), sep='/')
  to <- params$FiguresFolder

  #Copy to output directory
  file.copy(from, to, recursive = TRUE, copy.mode = TRUE)
}
SessionInfo <- sessionInfo()
Date <- date()

save.image(paste0(SavingFolder, '/ipsc.', 'FunctionalAnalysisWorkspace_HT.RData'))

last update on: Fri Jan 10 20:36:30 2025